Comprehensive pan-cancer investigation of carnosine dipeptidase 1 and its prospective prognostic significance in hepatocellular carcinoma

Abstract Carnosine dipeptidase 1 (CNDP1), an enzyme integral to the hydrolysis of dipeptides containing histidine, plays an indispensable role in myriad physiological processes, including hydrolysis of proteins, maturation of specific biochemical functionalities within proteins, tissue regeneration, and regulation of cell cycle. However, the implications of CNDP1 in oncogenesis and its prognostic value are not yet fully elucidated. Initially, we procured the GSE40367 dataset from the Gene Expression Omnibus and established a protein–protein interaction network. Thereafter, we conducted functional and pathway enrichment analyses utilizing GO, KEGG, and GSEA. Moreover, we undertook an association analysis concerning the expression of CNDP1 with immune infiltration, along with survival analysis across various cancers and specifically in hepatocellular carcinoma (HCC). Our study uncovered a total of 2,248 differentially expressed genes, with a down-regulation of CNDP1 in HCC and other cancers. Our explorations into the relationship between CNDP1 and immune infiltration disclosed a negative correlation between CNDP1 expression and the presence of immune cells in HCC. Survival analyses revealed that diminished expression of CNDP1 correlates with an adverse prognosis in HCC and several other types of cancer. These observations intimate that CNDP1 holds promise as a novel prognostic biomarker for both pan-cancer and HCC.


Introduction
According to the most recent data from the World Health Organization (WHO), approximately 20 million new cancer cases were recorded globally in 2022, underscoring cancer's persistent status as a formidable public health challenge. Notably, hepatocellular carcinoma (HCC) is ranked as the eighth most common cancer worldwide and is the third leading cause of cancer-related mortality [1]. Given the complex mechanisms of tumorigenesis, it is imperative to investigate gene expression across diverse cancer types to discern its correlation with clinical outcomes and the underlying molecular mechanisms. Surgical resection and liver transplantation are the therapeutic strategies aimed at curing HCC [2]. Despite these interventions, a considerable proportion of patients with HCC experience tumor recurrence within 5 years [3]. Consequently, there is an urgent need for prognostic assessment tools that better predict the clinical outlook of patients with HCC and other cancers. Establishing a prognostic model to predict survival probabilities and stratify patient outcomes is therefore of considerable importance.
Numerous biomarkers, including alpha-fetoprotein (AFP), AFP-L3, and DCP, as well as GPC3, HSP70, and SCCA, have been extensively explored as potential indicators for the diagnosis and prognosis of HCC [4][5][6][7]. Additionally, a plethora of molecular mechanisms are implicated in the onset and advancement of HCC, involving mutations in genes like TP53, CTNNB1, and AXIN1, and the disruption of signaling pathways including the Wnt/β-catenin pathway and the PI3K/AKT/mTOR pathway [8][9][10]. A deeper comprehension of these molecular dynamics is instrumental in the development of targeted treatments for HCC.
Carnosine dipeptidase 1 (CNDP1) is a protein-coding gene located on chromosome 18q22.3. The encoded protein, CNDP1, has a molecular weight of 56.8 kDa, is predominantly expressed in brain tissue, and is a homodimeric dipeptidase known as human carnosinase [11]. The gene carries a trinucleotide (CTG) repeat length polymorphism within its coding sequence [12]. CNDP1 is described as both a secreted and an intracellular protein, localized outside the cell membrane [11].
Recent findings indicate that CNDP1 is conspicuously downregulated in various cancers, including HCC. As a proteolytic enzyme, it specializes in cleaving histidine-containing dipeptides and is integral to numerous physiological functions, including protein degradation, maturation of specific proteins, tissue restoration, and cell cycle regulation [13].
Previous studies have noted certain correlations between CNDP1 and specific cancer types [14][15][16][17]. However, these correlations have not been definitively established across all forms of cancer. The link between CNDP1 and cancer does not seem to be universal; rather, it appears to depend on a variety of factors, including the type of tumor, its developmental stage, and individual patient variation. Consequently, a systematic investigation of the role of CNDP1 in a pan-cancer context is needed to determine whether CNDP1 is a viable new target for cancer therapy.
In this study, we analyzed the GSE40367 dataset acquired from the Gene Expression Omnibus (GEO). We identified differentially expressed genes (DEGs) within the dataset and performed protein-protein interaction (PPI) analysis as well as functional and pathway enrichment analyses. Remarkably, CNDP1 was among the top three down-regulated genes showing significant differential expression, yet its association with HCC had not been documented in previous research. Consequently, CNDP1 was designated as the focal gene for this study. Our results suggest that the expression of CNDP1 is connected to the immune response and holds promise as a prognostic biomarker for various malignancies, including HCC.

Patients of study
Between October 2021 and June 2022, a cohort of 75 patients diagnosed with HCC was enrolled at the Fifth Medical Center of Chinese PLA General Hospital. Diagnoses adhered to the 2019 WHO classification guidelines for digestive system tumors [18] and were corroborated by two independent pathologists.

Expression profile dataset selection
Datasets pertinent to mRNA associated with HCC were selected through a search of the publicly available GEO dataset portal on NCBI (https://www.ncbi.nlm.nih.gov/geo/), using the search terms "HCC" and "Homo sapiens." Among the resulting datasets, GSE40367, contributed by Roessler et al., was chosen. This dataset comprises 61 samples, encompassing colon adenocarcinoma, liver hemangioma, HCC, cholangiocarcinomas, and angiosarcoma. For our analysis, we selected five liver hemangioma samples to serve as normal liver controls alongside 32 HCC samples. The GSE40367 dataset is based on the GPL570 platform, the Affymetrix Human Genome U133 Plus 2.0 Array [19].

DEGs identification
Using the limma software package, we analyzed dataset GSE40367 to identify DEGs. P values were adjusted (adj.P) with the Benjamini and Hochberg false discovery rate procedure to balance the identification of statistically significant genes against false positives. Probe sets without gene symbols were excluded, and those corresponding to multiple gene symbols were consolidated. DEGs were defined by |log2FC| > 1.0 and adj.P < 0.05, which corresponds to at least a two-fold change in expression between the compared groups.
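The DEG criteria above can be illustrated with a minimal Python sketch. This is not the limma workflow used in the study: it only shows, under simplifying assumptions, how a Benjamini and Hochberg adjustment and the two cut-offs (|log2FC| > 1.0, adj.P < 0.05) combine to label a gene. The function names and the example gene names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(genes, log2fc, p_values, fc_cut=1.0, alpha=0.05):
    """Label each gene 'up', 'down', or 'ns' using the stated cut-offs."""
    adj = benjamini_hochberg(p_values)
    calls = {}
    for gene, lfc, q in zip(genes, log2fc, adj):
        if q < alpha and lfc > fc_cut:
            calls[gene] = "up"
        elif q < alpha and lfc < -fc_cut:
            calls[gene] = "down"
        else:
            calls[gene] = "ns"
    return calls


# Hypothetical toy input (not data from GSE40367).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2fc = [2.1, -1.8, 0.4, -3.0]
raw_p = [0.001, 0.004, 0.03, 0.20]
calls = call_degs(genes, log2fc, raw_p)
```

In this toy input, GENE_C passes the adjusted P cut-off but not the fold-change cut-off, and GENE_D shows the opposite pattern, so neither is called a DEG.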

PPI network construction
To elucidate the gene connections, the DEGs were submitted to the Search Tool for the Retrieval of Interacting Genes [STRING (version 12.0)] online platform (http://string-db.org) to construct the PPI network, using stringent criteria (minimum required interaction score: highest confidence, 0.900; k-means clustering: 3 clusters). In this study, STRING was used to analyze the PPI of the top 100 DEGs ranked by magnitude of differential expression. The network was then built from the interaction data and visualized using Cytoscape software (version 3.9.1). A list of protein interactors was obtained, and Cytoscape was used to draw and examine the PPI networks, considering an interaction score of no less than 0.4 as significant. Key protein modules in the PPI network were filtered with the Molecular Complex Detection (MCODE) plug-in of Cytoscape. Additionally, the CytoNCA plug-in, which uses centrality-based measures, was applied to identify hub genes within the PPI networks. All targets were arranged in circles, with a high centrality value denoting a more prominent role within the network.

Functional and pathway enrichment analysis
DEGs were analyzed for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichments using the clusterProfiler package in R, as facilitated by the bioinformatics platform (https://www.bioinformatics.com.cn) [20]. The GO enrichment analysis is divided into three categories: biological processes (BP), cellular components (CC), and molecular functions (MF). Additionally, gene set enrichment analysis (GSEA) was performed using the R packages "clusterProfiler" and "GSVA," with h.all.v7.2.symbols.gmt as the annotated gene set. The Normalized Enrichment Score (NES) was determined following 1,000 permutations. A gene set was deemed significantly enriched if it met the criteria of |NES| > 1, P < 0.05, and a false discovery rate < 0.25.
The results were visualized in a bubble plot created with the "ggplot2" R package.
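The three GSEA significance criteria can be expressed as a single filter. The sketch below is illustrative only: the record type, the function name, and the numeric values are hypothetical and are not results from this study.

```python
from typing import NamedTuple


class GseaResult(NamedTuple):
    gene_set: str
    nes: float      # normalized enrichment score
    p_value: float  # nominal P value
    fdr: float      # false discovery rate


def is_enriched(result, nes_cut=1.0, p_cut=0.05, fdr_cut=0.25):
    """Apply the stated criteria: |NES| > 1, P < 0.05, FDR < 0.25."""
    return (
        abs(result.nes) > nes_cut
        and result.p_value < p_cut
        and result.fdr < fdr_cut
    )


# Hypothetical values for illustration only.
results = [
    GseaResult("HALLMARK_G2M_CHECKPOINT", -1.9, 0.002, 0.01),
    GseaResult("HALLMARK_XENOBIOTIC_METABOLISM", 1.4, 0.03, 0.30),
    GseaResult("HALLMARK_MYOGENESIS", 0.8, 0.40, 0.60),
]
significant = [r.gene_set for r in results if is_enriched(r)]
```

A gene set must satisfy all three conditions at once; in the toy list, the second entry fails only on the false discovery rate.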

Analysis of CNDP1 expression
The Tumor Immune Estimation Resource 2.0 (TIMER2.0, http://timer.cistrome.org/) leverages high-throughput sequencing data to examine immune cell infiltration within tumor specimens and compares these findings with those from normal control tissues. The Gene Expression Profiling Interactive Analysis 2 (GEPIA2, http://gepia2.cancer-pku.cn/#index) integrates cancer genomics data to facilitate efficient data mining and dynamic examination of gene expression profiles. Both TIMER2.0 and GEPIA2 were used to investigate differences in CNDP1 expression across diverse cancers, including HCC.

Immunohistochemical staining
The differential protein expression levels of CNDP1 in HCC and corresponding normal tissues were obtained from the Human Protein Atlas (HPA, https://www.proteinatlas.org/), which encompasses both healthy and oncological tissue samples. These expression profiles are derived from the examination of tissue specimens.

Biochemical function, intracellular distribution, and structural information of CNDP1
The UniProt repository, which integrates data from several established databases, provides detailed information on the biochemical function, intracellular localization, and structural attributes of proteins. Using the terms "CNDP1" and "HUMAN," we identified the protein of interest (Q96KN2, CNDP1_HUMAN). The protein annotation provided by UniProt, including function, subcellular localization, and structure, is crucial for gaining a deeper understanding of the biological function of CNDP1 and its potential mechanism of action.

Association analysis of CNDP1 expression with immune cell infiltration in HCC
Tumor-infiltrating lymphocytes have been recognized as independent prognostic indicators for both the status of sentinel lymph nodes and the overall survival rate in cancer patients. The TIMER2.0 database was employed to ascertain the relationship between immune infiltration and CNDP1 expression in HCC [21]. Correlations were assessed with the Spearman correlation coefficient after adjusting for tumor purity: P < 0.05 with Rho > 0 denoted a positive correlation, whereas P < 0.05 with Rho < 0 indicated a negative correlation. P < 0.05 was considered statistically significant.
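For readers unfamiliar with the statistic, the following minimal sketch computes a Spearman coefficient from two paired lists and applies the sign rule described above. It is a simplified illustration: it does not reproduce the purity adjustment performed by TIMER2.0, and it takes the P value as an input rather than computing it. All function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def classify_correlation(rho, p_value, alpha=0.05):
    """Apply the rule in the text: significance first, then the sign of rho."""
    if p_value >= alpha:
        return "not significant"
    if rho > 0:
        return "positive"
    if rho < 0:
        return "negative"
    return "none"
```

The P value would in practice come from the analysis platform or a statistics library; it is left as an argument here to keep the sketch dependency-free.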

Gene mutation, immuno-infiltration, and methylation analysis in pan-cancer
Using the cBioPortal database (http://www.cbioportal.org/) and the Gene Set Cancer Analysis platform (http://bioinfo.life.hust.edu.cn/GSCA/#/), we analyzed the expression of CNDP1 with emphasis on gene copy number variation and methylation. The UALCAN database (http://ualcan.path.uab.edu) provided data on DNA methylation levels within the CNDP1 promoter. For the examination of immune correlations, we used the EPIC [22] and CIBERSORT [23] algorithms and calculated Spearman's correlation coefficient to describe the relationship between CNDP1 expression and immune cell infiltration across various tumors, presented as a heat map. Furthermore, we explored the association between CNDP1 expression and tumor mutational burden (TMB) within the TCGA cohort using the "maftools" R package. The correlation of CNDP1 expression with TMB across different cancer types was assessed using the Spearman method, with findings presented as heat map and radar map visualizations.

Prognostic analysis
In this study, we partitioned the survival data of distinct cancers from the TCGA database into cohorts with high and low CNDP1 expression based on median gene expression levels. To ascertain the prognostic relevance of CNDP1 in cancer, we conducted a Kaplan-Meier survival analysis. We employed five clinical metrics, namely overall survival (OS), disease-free survival (DFS), disease-specific survival (DSS), progression-free interval (PFI), and disease-free interval (DFI), to explore the association between CNDP1 expression levels and patient prognoses. Forest plots for Cox regression analysis were generated using the "forestplot" and "survival" packages in R. We further investigated the correlation between the variation in CNDP1 expression and the prognosis of HCC patients across varying tumor microenvironments.
Hazard ratios with 95% confidence intervals (CI) were calculated, along with log-rank P values. A threshold level of P < 0.05 was considered statistically significant.
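The median split and the Kaplan-Meier estimate described above can be sketched in a few lines of Python. This is an illustration of the technique, not the R "survival" workflow used in the study. The handling of samples exactly at the median (assigned to the low group here) is an assumption, since the text does not specify it, and all names and toy values are hypothetical.

```python
from statistics import median


def split_by_median(expression):
    """Label samples 'high' if above the median, otherwise 'low'.

    Assumption: samples exactly at the median go to the 'low' group.
    """
    cut = median(expression)
    return ["high" if value > cut else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy data: follow-up in months and event indicators.
groups = split_by_median([1.0, 4.0, 2.0, 8.0])
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

Censored subjects leave the risk set without lowering the curve, which is why the estimate only steps down at observed event times. Comparing the high and low groups would additionally require a log-rank test, which is omitted here.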

Laboratory and MRI as well as histopathologic examination from our own samples
All preoperative routine examination parameters, encompassing laboratory assessments and MRI, were gathered from the electronic medical record system of the hospital. The imaging findings were analyzed by two senior radiologists, and the histopathological assessments were performed by two experienced pathologists. Serum CNDP1 concentrations were quantified by enzyme-linked immunosorbent assay (ELISA) using a commercial ELISA kit (EK1957, BOSTER, Wuhan, China).

Results

Identification of DEGs in GSE40367 and PPI network construction of DEGs
We analyzed GEO datasets pertaining to patients diagnosed with HCC and selected GSE40367 as the dataset of choice. The sample information contained in this dataset allowed the identification of DEGs (healthy controls [5 samples] versus HCC patients [32 samples]). Using the limma package, we identified 1,121 upregulated and 1,127 downregulated DEGs in GSE40367 following log2 transformation (Figure 1a-c). The top 20 genes exhibiting significant variations are listed in Table 1.

Functional and pathway enrichment analysis
To elucidate the roles of DEGs in GSE40367, we conducted a GO enrichment analysis. The findings revealed that the BP terms predominantly encompass the catabolism of small molecules, organelle fission, chromosome segregation, nuclear division, and the metabolic processing of fatty acids. CC terms were primarily associated with chromosomal regions, condensed chromosomes, spindles, chromosomes, centromeric regions, cytoplasmic vesicle lumens, and vesicle lumens. MF terms notably included monooxygenase activity, oxidoreductase activity acting on paired donors with the incorporation or reduction of molecular oxygen, heme binding, lyase activity, and iron ion binding (Figure 2a-c). The KEGG enrichment analysis highlighted the following pathways: cell cycle, fatty acid degradation, drug metabolism via cytochrome P450, tryptophan metabolism, chemical carcinogenesis through DNA adducts, tyrosine metabolism, pyruvate metabolism, metabolism of xenobiotics by cytochrome P450, drug metabolism by other enzymes, and DNA replication (Figure 2d-f).
Furthermore, we used GSEA to investigate potential signaling pathways through which aberrant CNDP1 expression influences functional states across 33 distinct cancer types. We noted a marked enrichment in several signaling pathways, including epithelial-mesenchymal transition, G2/M checkpoint, xenobiotic metabolism, and immune-related pathways such as IFN-α response, IFN-γ response, and TNFα signaling via NFκB. Additionally, the predominant pathways associated with GSEA enrichment encompass biological oxidations, phase I functionalization of compounds, and various cell cycle processes such as checkpoints and mitotic events (Figure 2g-i). The GO-KEGG analysis results for DEGs in GSE40367 and the CNDP1 gene set enrichment are summarized in Table 2.
We analyzed the expression of CNDP1 in HCC (LIHC) and at different stages of HCC progression. CNDP1 expression was markedly lower in 369 HCC specimens than in 160 normal hepatic tissues (Figure 3f). Furthermore, the expression of CNDP1 was highest in clinical stage I, lowest in clinical stage IV, and intermediate in clinical stages II and III (Figure 3e). The HPA database, which integrates proteomic, transcriptomic, and systems biology data, corroborates these findings, indicating a reduced expression of CNDP1 protein in HCC relative to normal liver tissue (Figure 3g and h).

The subcellular location, function, and structural analysis
According to the data sourced from the HPA and GeneCards databases, CNDP1 functions as both a secreted and an intracellular protein, localized outside the cell membrane. This protein catalyzes the hydrolysis of the peptide bond in Xaa-His dipeptides. It exhibits pronounced enzymatic activity toward carnosine (β-alanyl-L-histidine) and anserine (β-alanyl-3-methyl-histidine) [11,24]. As a catalyst, CNDP1 uses Zn2+ as a cofactor, coordinating two Zn2+ ions per subunit at the binding sites His132, Asp165, Glu200, and Asp228. CNDP1 is activated by the binding of cadmium ions at residues Asp134 and Glu199 and is inhibited by the metal chelator 1,10-phenanthroline, with a half-maximal inhibitory concentration (IC50) of 5 µM. The kinetic properties of CNDP1 are listed in Table 3 [11,24]. Furthermore, CNDP1 presents two antibody-binding domains, spanning residues 32-133 and 256-334. The protein is composed of 507 amino acids, the first 26 of which constitute the signal peptide.

Immune cell infiltration in HCC
The tumor microenvironment is an intricate, integrated system arising from the interaction between neoplastic cells and the adjacent tissues and immune constituents. This milieu augments the proliferative, migratory, and immune evasion capabilities of tumor cells, thus facilitating the onset and progression of neoplastic conditions. Tumor-infiltrating lymphocytes have been identified as independent predictors of sentinel lymph node status and survival in cancer patients. Analyses of immune infiltration revealed a correlation between CNDP1 expression and the level of immune infiltration in HCC. The expression of CNDP1 was negatively correlated with various immune cells: CD4+ T cells, encompassing CD4+ T memory cells, CD4+ T memory activated cells, CD4+ T central memory cells, CD4+ Th1 cells, CD4+ Th2 cells, and CD4+ T (non-regulatory) cells; CD8+ T cells; T regulatory cells; B cells, including B memory cells; myeloid-derived suppressor cells (MDSC); mast cells; myeloid dendritic cells; monocytes; and common lymphoid progenitor cells. Conversely, the expression of CNDP1 was positively correlated with macrophages, including M1 and M2 macrophages; neutrophils; endothelial cells; granulocyte-monocyte progenitors; hematopoietic stem cells; and cancer-associated fibroblasts (Figure 5). P < 0.05 was considered statistically significant.

Immunological landscape in pan-cancer
In this study, we employed the EPIC and CIBERSORT algorithms to explore the potential correlation between immune cell infiltration levels and CNDP1 expression across various cancer types in the TCGA dataset. The findings from the EPIC algorithm (Figure 6a) revealed a significant negative association between B cell immune infiltration and CNDP1 expression in thymoma (THYM), testicular germ cell tumors, stomach adenocarcinoma (STAD), LUSC, LUAD, and HNSC. Moreover, a positive correlation was observed between CD4 T cell infiltration and CNDP1 expression in STAD and GBM, while a negative correlation was noted between CD8 T cell infiltration and CNDP1 expression in LUSC and LUAD. The results obtained from the CIBERSORT algorithm are presented in Figure 6b.
We then assessed the copy number variations (CNV) of the CNDP1 gene in a pan-cancer context (Figure 6c). The findings showed a pronounced incidence of CNV in cancers such as KICH, THCA, GBM, and UCEC. Additionally, our examination of the relationship between CNV and CNDP1 expression across cancers (Figure 6d) revealed a positive correlation between CNV levels and CNDP1 expression in multiple cancer types, including bladder urothelial carcinoma (BLCA), breast invasive carcinoma, CHOL, colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), HNSC, KICH, KIRC, KIRP, LUAD, LUSC, pancreatic adenocarcinoma (PAAD), PRAD, rectum adenocarcinoma (READ), and STAD. In contrast, GBM and HCC (LIHC) exhibited an inverse correlation between CNV and CNDP1 expression levels.
To explore the potential significance of CNDP1 in predicting the effectiveness of immune checkpoint inhibitor treatment, we examined the relationship between CNDP1 expression levels and TMB, a well-established biomarker for immunotherapy prediction. CNDP1 expression was positively correlated with TMB in BLCA, CHOL, COAD, ESCA, HNSC, KIRP, acute myeloid leukemia, and LUAD. Conversely, CNDP1 expression was negatively correlated with TMB in KICH, LIHC, ovarian serous cystadenocarcinoma (OV), PAAD, PRAD, READ, SKCM, and UCEC (Figure 6e and f). Furthermore, elevated promoter methylation levels of CNDP1 were identified in GBM compared to other cancer types (Figure 6g). Additionally, the analysis of tumor and normal promoter methylation of CNDP1 across various cancers indicated a negative correlation between promoter methylation and CNDP1 expression in CHOL, GBM, and PRAD, and a positive correlation in BLCA, cervical squamous cell carcinoma and endocervical adenocarcinoma, ESCA, KICH, KIRP, LIHC, PAAD, and UCEC (Figure 6h).

Prognostic analysis
The tumor data obtained from TCGA were classified into low and high expression groups based on the median level of CNDP1 expression. We then investigated the impact of CNDP1 expression on patient prognosis across various types of cancer. Forest plots revealed that a low level of CNDP1 was associated with a better OS outcome in uterine carcinosarcoma (UCS), KIRP, and KIRC, while high expression of CNDP1 was linked to poorer OS in THCA (Figure 7a). Patients with low CNDP1 expression had favorable DFS in UCS and SARC. Conversely, patients with high CNDP1 expression had unfavorable DFS in PRAD and PAAD (Figure 7b). Furthermore, low expression of CNDP1 was associated with worse DSS in UCS, KIRP, and KIRC, whereas high expression of CNDP1 was associated with DSS in THCA (Figure 7c). Low expression of CNDP1 indicated better PFI in UCS and KIRC, while high expression showed worse PFI in THYM (Figure 7d). Additionally, low expression of CNDP1 was linked to favorable DFI in UCS and SARC, while a high level of CNDP1 was associated with a worse DFI outcome for patients (Figure 7e).
We further explored the correlation between CNDP1 expression and clinical outcomes using OS analysis in eight cancers in which CNDP1 levels differed notably between cancerous tissues and their normal counterparts (Figure 7f-m). In KIRC, the difference in survival between the groups with high and low CNDP1 expression was statistically significant (P = 0.012), with patients exhibiting elevated CNDP1 levels experiencing superior OS compared to their low-expression counterparts. In LIHC (HCC), the survival curves of the two groups separated, with the high CNDP1 expression group showing greater OS during an 80-month observation period, although the difference did not reach the P < 0.05 threshold (P = 0.073). These findings suggest that diminished CNDP1 expression may be associated with adverse prognosis in HCC patients, positioning CNDP1 as a potentially valuable prognostic biomarker in HCC.
We delved further into the correlation between the variation in CNDP1 expression and the prognostic outcomes in HCC patients across varying tumor microenvironments (Figure 7n-s).

Prediction model for microvascular invasion (MVI)
A nomogram was devised to forecast the likelihood of MVI using preoperative data. This model assessed tumor-specific parameters, including tumor size and count, in addition to the presence of liver cirrhosis, through preoperative diagnostic imaging. Serum samples were obtained from the 75 HCC patients enrolled in this study, and the levels of serum CNDP1 were quantified by ELISA. Serum CNDP1 was markedly lower in the MVI group than in the non-MVI group (P = 0.011) (Figure 8a). Therefore, we included CNDP1 in the univariate logistic regression analysis to assess its predictive significance for MVI in patients with HCC. Univariate logistic analysis identified cirrhosis (P = 0.085), tumor number (P < 0.001), tumor size (P = 0.001), and serum CNDP1 level (P = 0.007) as factors associated with MVI occurrence (Table 4); of these, cirrhosis did not reach the P < 0.05 threshold at this stage. Multivariate logistic regression analysis incorporating the above four variables identified cirrhosis (P = 0.050), multiple tumors (P = 0.002), tumor size (P = 0.007), and CNDP1 level (P = 0.021) as independent predictors of MVI (Table 5). Thus, the final logistic regression equation included these four factors: Y = 2.623 × cirrhosis + 2.278 × tumor number + 1.752 × tumor diameter − 0.010 × serum CNDP1 concentration − 2.789. A nomogram was subsequently constructed based on this model (Figure 8b), providing a visual prediction of the risk of MVI prior to surgery in patients with HCC. This nomogram showed good discrimination for the risk of MVI, with a concordance index of 0.867 [95% CI (0.784-0.949)] (Figure 8c). Calibration plots demonstrated satisfactory agreement between the risk predicted by the nomogram and the observed MVI estimates (Figure 8d). The decision curve associated with the nomogram is illustrated in Figure 8e.
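As a worked illustration of the regression equation, the sketch below evaluates Y and converts it to a probability with the standard logistic function, p = 1 / (1 + exp(−Y)). Two points are assumptions rather than statements from the text: that Y is the log-odds of MVI, and that each predictor is supplied in the same coding and units used to fit the model (for example, cirrhosis as 0 or 1), which are not specified in this section. The function and parameter names are hypothetical.

```python
import math

# Coefficients copied from the reported regression equation.
COEFFICIENTS = {
    "cirrhosis": 2.623,
    "tumor_number": 2.278,
    "tumor_diameter": 1.752,
    "serum_cndp1": -0.010,
}
INTERCEPT = -2.789


def linear_predictor(cirrhosis, tumor_number, tumor_diameter, serum_cndp1):
    """Evaluate Y from the reported equation.

    Inputs are assumed to be coded exactly as in the original model,
    which this section does not specify.
    """
    return (
        COEFFICIENTS["cirrhosis"] * cirrhosis
        + COEFFICIENTS["tumor_number"] * tumor_number
        + COEFFICIENTS["tumor_diameter"] * tumor_diameter
        + COEFFICIENTS["serum_cndp1"] * serum_cndp1
        + INTERCEPT
    )


def mvi_probability(cirrhosis, tumor_number, tumor_diameter, serum_cndp1):
    """Map Y to a probability, assuming Y is the log-odds of MVI."""
    y = linear_predictor(cirrhosis, tumor_number, tumor_diameter, serum_cndp1)
    return 1.0 / (1.0 + math.exp(-y))
```

Because the serum CNDP1 coefficient is negative, a higher serum concentration lowers the predicted probability when the other inputs are held fixed, which is consistent with the lower serum CNDP1 observed in the MVI group.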

Discussion
Gene expression profiling using microarray technology is an established tool for clarifying the intrinsic mechanisms of diseases and for pinpointing genes and pathways associated with various malignancies that may otherwise have remained undiscovered. This methodology sheds light on the molecular underpinnings of cancer and delineates potential avenues for therapy [25]. To identify potential prognostic biomarkers across cancers, and especially in HCC, we analyzed genomic microarray data from GSE40367. Our analysis identified 2,248 DEGs, comprising 1,121 up-regulated and 1,127 down-regulated genes. Gene bioinformatics offers a potential molecular targeting approach for the prevention and management of HCC. We performed a series of enrichment analyses including GO, KEGG, and GSEA. The findings suggest that the DEGs in GSE40367 may be linked to pathways related to biological oxidations, cell cycle, and fatty acid degradation when compared to the control group. These results align with previous research studies [26][27][28].
In our literature search, we found no publications that have conducted pan-cancer analyses of CNDP1 from a comprehensive tumor perspective. Therefore, we examined CNDP1 in a pan-cancer setting based on data from the TCGA, GEO, HPA, and UniProt databases. This examination included an analysis of molecular characteristics such as gene expression, promoter methylation, biological function, and protein structure. CNDP1 was generally expressed at low levels in most tumors, including HCC. Additionally, CNDP1 is both a secreted and an intracellular protein located outside the cell membrane. It is recognized for its ability to catalyze the hydrolysis of the peptide bond in Xaa-His dipeptides.
Tumor-infiltrating immune cells, pivotal elements within the tumor microenvironment, are intimately linked with the genesis, progression, and dissemination of cancer [29,30]. Cancer-associated fibroblasts located within the stromal region of the tumor microenvironment have been documented to influence the activity of diverse immune cells that infiltrate the tumor [31]. In this research, we employed CIBERSORT, EPIC, and other algorithms to explore the potential interplay between varying levels of immune cell infiltration and CNDP1 expression across diverse cancer types in the TCGA. We found a statistically significant inverse relationship between B-cell immune infiltration, as determined by the EPIC algorithm, and CNDP1 expression in six distinct tumors, including THYM. Additionally, our analysis of immune cell infiltration in HCC showed a negative association between CNDP1 expression and the presence of B cells, CD4+ T cells, CD8+ T cells, T regulatory cells, MDSC, mast cells, myeloid dendritic cells, monocytes, and common lymphoid progenitor cells. These insights imply that CNDP1 may play a role in forecasting patient outcomes in HCC and other tumors. Specifically, elevated levels of CNDP1 expression might correlate with a more favorable prognosis for patients. Deletion or duplication of genome fragments larger than 1 kb resulting from genome rearrangement is called CNV. Genome CNV is part of normal human genetic variation [32][33][34]. TMB is a key biomarker for predicting tumor immunotherapy outcomes and reflects important immune regulatory features [35]. DNA methylation, an epigenetic mechanism, profoundly influences gene transcription [36]. Consequently, in this study, we investigated the relationship between CNDP1 expression and CNV, TMB, and promoter methylation across various cancers. Our findings indicate that the interplay between CNV, TMB, promoter methylation, and CNDP1 expression differs across cancer types. In HCC, CNDP1 expression correlates negatively with CNV and TMB and positively with promoter methylation. These observations suggest that the prognostic value of CNDP1 varies among different cancers, and elevated expression of CNDP1 in HCC may signify a more favorable prognosis.
However, survival analysis of CNDP1 yielded different conclusions for different tumors. Our research revealed a notable correlation between reduced CNDP1 expression and unfavorable outcomes in patients with HCC. Moreover, elevated levels of CD4+ Th2 cells, mast cells, hematopoietic stem cells, and MDSC were correlated with an unfavorable prognosis in patients. These observations indicate that CNDP1 may act as a crucial prognostic biomarker in HCC.
Relevant research indicates that MVI serves as the initial stage in the progression of HCC, ultimately leading to intrahepatic tumor spreading or systemic metastases [37,38]. MVI has been acknowledged as a pivotal prognostic indicator subsequent to hepatic resection for HCC [39][40][41][42]. Following the outcomes of multivariate logistic regression analysis, we developed a logistic regression equation incorporating cirrhosis, tumor number, tumor diameter, and serum CNDP1 level. Subsequently, we constructed a nomogram for predicting MVI, which enhanced the interpretability and convenience of the prediction model for clinicians. The findings corroborate that CNDP1 serves as an independent predictor of MVI in patients with HCC.
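As an illustration only, the general form of a logistic regression equation with these four predictors can be sketched as follows. The predictor keys, their encoding, and the function name are assumptions made for this sketch; no coefficient values are given here, and the fitted values would have to be taken from the multivariate analysis itself.

```python
import math

# Predictor names follow the four variables named in the text; the exact
# encoding (e.g. 0/1 for cirrhosis, units for diameter) is an assumption.
PREDICTORS = ("cirrhosis", "tumor_number", "tumor_diameter", "serum_cndp1")


def mvi_probability(intercept: float, coefs: dict, patient: dict) -> float:
    """Return the predicted probability of MVI from a logistic model.

    `coefs` maps each predictor name to its fitted beta coefficient and
    `patient` maps the same names to that patient's values. Both are
    supplied by the caller; nothing is hard-coded from the study.
    """
    missing = [name for name in PREDICTORS if name not in coefs or name not in patient]
    if missing:
        raise KeyError(f"missing predictors: {missing}")
    # Linear predictor: beta_0 + sum(beta_i * x_i)
    linear = intercept + sum(coefs[name] * patient[name] for name in PREDICTORS)
    # Logistic (sigmoid) transform maps the linear predictor to (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))
```

A nomogram is a graphical form of the same calculation: each predictor contributes points proportional to its coefficient times its value, and the total is mapped to a probability through the logistic transform.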

Conclusion
In conclusion, our research demonstrates that CNDP1 is consistently underexpressed across various cancer types, including HCC. Moreover, diminished levels of CNDP1 correlate with adverse outcomes in these malignancies. Our analysis further elucidated the enriched pathways of DEGs in GSE40367 and the associated immune infiltration in pan-cancer. These insights may enhance our comprehension of the molecular underpinnings and furnish clinically pertinent molecular targets for prognostication in both pan-cancer and HCC contexts. In essence, this study supports the identification of critical prognostic biomarkers for both pan-cancer and HCC.

Figure 1: Transcriptome analysis of GSE40367 microarray. (a) UMAP score plot of HCC group and normal liver tissue in GSE40367. (b) Volcano plot of DEGs in GSE40367. Red dots indicate highly expressed genes, blue dots indicate lowly expressed genes, and gray dots indicate genes that are not differentially expressed according to P < 0.05 and |log2FC| > 1.0. (c) Heatmap of DEGs showing hierarchical cluster analysis of gene transcriptional changes in the two groups. HCC, hepatocellular carcinoma; NLT, normal liver tissue. (d) PPI network of the top 50 up-regulated targets and the top 50 down-regulated targets according to the STRING database. The network's nodes symbolize proteins, and the edges represent protein-protein associations. (e) Protein interaction network derived from the PPI analysis using the CytoNCA plugin in Cytoscape; circles represent proteins and lines represent the interactions between them.
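The coloring rule of the volcano plot in panel (b) can be written as a short sketch. The function name and labels are illustrative, and whether the P value used in the study was adjusted for multiple testing is not stated in the caption, so the sketch simply takes whichever P value the caller supplies.

```python
def classify_gene(log2_fc: float, p_value: float,
                  p_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> str:
    """Label a gene using the thresholds given in the caption.

    A gene counts as differentially expressed only when both conditions
    hold: P < p_cutoff and |log2FC| > lfc_cutoff (strict inequalities).
    """
    if p_value < p_cutoff and abs(log2_fc) > lfc_cutoff:
        # Sign of log2FC decides the direction (red vs. blue dots).
        return "up" if log2_fc > 0 else "down"
    # Everything else corresponds to the gray dots.
    return "not significant"
```

For example, a gene with log2FC = -2.0 and P = 0.001 would be labeled "down", whereas a gene with log2FC = 3.0 and P = 0.2 would remain "not significant" despite its large fold change.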

Figure 2: Enrichment pathway analysis of DEGs in GSE40367 microarray. (a)-(c) Bubble plot of GO enrichment analysis results. X-axis represents the percentage of genes associated with each functional term. Y-axis represents the annotated terms of gene enrichment. The size of each circle corresponds to the number of genes, with larger circles representing a greater gene count. The color of the circles reflects the adjusted P value. (d)-(f) Bar, chord, and bubble plot of KEGG enrichment analysis results. In the bar plot, X-axis represents the ES of protein enrichment, Y-axis represents the top 10 pathways ranked by the degree value of KEGG signaling pathways. In the bubble plot, X-axis represents the percentage of genes associated with each functional term. Y-axis represents the annotated terms of gene enrichment. The size of each circle corresponds to the number of genes, with larger circles indicating a greater gene count. The color of the circles reflects the adjusted P value. In the chord plot, the line segments connect genes and enrichment pathways, with different colors representing distinct enrichment pathways. The size of the circles corresponds to the number of connected line segments; larger circles indicate a greater number of connected genes and pathways. Yellow circles represent pathways, other circles represent genes. (g)-(i) GSEA analysis of GSE40367. (g) Functional analysis of CNDP1 in pan-cancer. The heat map uses color-coding to represent the degree of gene enrichment: red signifies higher ESs and blue signifies lower ESs. Each row corresponds to a distinct gene set, and each column corresponds to a distinct type of cancer. (h) GSEA visual analysis shows the concentration of gene sets in the sorted list. (i) GSEA ridgeplot illustrates how the ES of a gene set changes with the sorting of the gene list. The X-axis represents the sequenced gene list, while the Y-axis represents the ES. The curve (or "mountain") in the figure demonstrates how the ES of a particular gene set changes cumulatively as the list of genes is traversed. The peak ES indicates the region of enrichment for the gene set in the list. DEGs, differentially expressed genes; ES, enrichment score.
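The cumulative behavior of the ES described for panel (i) can be illustrated with a deliberately simplified sketch. It uses an unweighted running sum (every gene-set member contributes equally), whereas standard GSEA usually weights each hit by the gene's ranking metric; the function and variable names are assumptions for this illustration and do not reproduce the software used in the study.

```python
def running_enrichment_score(ranked_genes, gene_set):
    """Walk down a ranked gene list and track an unweighted running sum.

    The sum rises by 1/n_hits at each gene that belongs to `gene_set` and
    falls by 1/n_miss at each gene that does not. Returns the full curve
    and the ES, defined here as the point of maximum deviation from zero.
    """
    members = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("ranked list must contain both members and non-members")

    hit_step = 1.0 / n_hits
    miss_step = 1.0 / n_miss
    running = 0.0
    curve = []
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        curve.append(running)

    # The enrichment score is the signed extreme of the curve (the "peak").
    enrichment_score = max(curve, key=abs)
    return curve, enrichment_score
```

A gene set concentrated at the top of the ranked list produces an early positive peak, and one concentrated at the bottom produces a negative trough near the end, which is the pattern the "mountain" curve visualizes.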

Figure 3: Differential expression of CNDP1. (a) CNDP1 gene expression in different cancer types from TIMER2.0. Red dots represent the tumor group, blue dots represent the normal control group, and for cancer types with a control group, the background color is shown as gray. The statistical significance calculated by the Wilcoxon test is displayed above the bar chart. (b) CNDP1 gene expression in different cancer types from GEPIA2. The red columns represent the tumor group and the blue columns represent the normal control group. (c) CNDP1 gene expression in different cancer types from TCGA. The red areas represent the tumor group and the yellow areas represent the normal control group. (d) CNDP1 gene expression in different cancer types from TCGA. The red dots represent the tumor group and the blue dots represent the normal control group. (e) CNDP1 gene expression in different stages of HCC progression, derived from GEPIA2. (f) CNDP1 gene expression in HCC tissues (LIHC) and normal liver tissues, derived from GEPIA2. (g) and (h) Expression of CNDP1 in HCC tissues and normal liver tissues at the protein level, derived from the HPA database. CNDP1, carnosine dipeptidase 1; HCC, hepatocellular carcinoma (*, P < 0.05; **, P < 0.01; ***, P < 0.001).

Figure 5: Correlation analysis between CNDP1 expression and immune cell type in HCC. (a)-(x) Correlation analysis between CNDP1 expression and different types of immune infiltrating cells in HCC. The Spearman correlation coefficient was used to quantify the relationship. CNDP1, carnosine dipeptidase 1; HCC, hepatocellular carcinoma. P < 0.05 and correlation coefficient (R) > 0 indicate a positive correlation; P < 0.05 and R < 0 indicate a negative correlation.

Figure 6: Immunological landscape in pan-cancer. (a) Analysis of immune cell infiltration levels and CNDP1 expression from EPIC. (b) Analysis of immune cell infiltration levels and CNDP1 expression from CIBERSORT. (c) The bar chart illustrates the rate of gene CNV for CNDP1 in pan-cancer. (d) The bubble diagram demonstrates the relationship between CNDP1 expression and CNV in pan-cancer. (e) The bubble map displays the correlation between CNDP1 expression and TMB in pan-cancer. (f) The radar map showcases the association between CNDP1 expression and TMB in pan-cancer. (g) The bubble map shows differences in promoter methylation of CNDP1 between tumor and normal cells in pan-cancer. (h) The bubble plot illustrates the correlation between promoter methylation and expression of CNDP1 in pan-cancer. CNDP1, carnosine dipeptidase 1; CNV, copy number variation; TMB, tumor mutational burden.

Figure 8: Prediction model to estimate the risk of MVI presence preoperatively in HCC. (a) Serum CNDP1 level was significantly decreased in the MVI group. (b) Nomogram to estimate the risk of MVI presence preoperatively in HCC. The nomogram incorporates cirrhosis, tumor number, tumor diameter, and serum CNDP1 level. (c) ROC curve based on the prediction model and other indicators in the model (n = 75). (d) Calibration curve based on the prediction model (n = 75). (e) Decision curve based on the prediction model (n = 75). MVI, microvascular invasion; HCC, hepatocellular carcinoma; CNDP1, carnosine dipeptidase 1. P < 0.05 was considered statistically significant.
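For readers unfamiliar with how the ROC curve in panel (c) is summarized, the area under the curve (AUC) can be computed directly from predicted scores and observed labels. The sketch below uses the pairwise (Mann-Whitney) formulation; the function name and the 0/1 label coding are assumptions for illustration and do not describe the software used in the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    The AUC equals the probability that a randomly chosen positive case
    (label 1, e.g. MVI present) receives a higher score than a randomly
    chosen negative case (label 0); ties count as one half.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.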

Table 1: Top 20 genes with different changes. AveExpr, the average expression of the gene in all samples; logFC, log2 of the HCC/NLT expression ratio. HCC, hepatocellular carcinoma; NLT, normal liver tissue.

Table 2: Top five GO-KEGG enrichment analysis results. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; BP, biological process; CC, cellular component; MF, molecular function.

Table 5: Multivariate logistic analysis results of the patients. OR, odds ratio; CI, confidence interval; CNDP1, carnosine dipeptidase 1. a Unstandardized β coefficients were calculated from the multivariate logistic regression model. * Indicates P < 0.05.